Abstract

Consider the following method of card shuffling. Start with a deck of $N$ cards numbered 1 through $N$. Fix a parameter $p$ between 0 and 1. In this model a "shuffle" consists of uniformly selecting a pair of adjacent cards and then flipping a coin that is heads with probability $p$. If the coin comes up heads, then we arrange the two cards so that the lower numbered card comes before the higher numbered card. If the coin comes up tails, then we arrange the cards with the higher numbered card first. In this paper we prove that for all $p \neq 1/2$, the mixing time of this card shuffling is $O(N^2)$, as conjectured by Diaconis and Ram [?]. Our result is a rare case of an exact estimate for the convergence rate of the Metropolis algorithm. A novel feature of our proof is that the analysis of an infinite (asymmetric exclusion) process plays an essential role in bounding the mixing time of a finite process.

1 Introduction 

The Metropolis algorithm is a widely used algorithm for sampling from distributions on large finite sets. A variety of techniques have been developed to analyze the convergence rate of the Metropolis algorithm (see [?]), yet in many problems arising in applications we do not know how to estimate the convergence rate. In this paper we introduce new techniques which allow the analysis of a card shuffling Metropolis algorithm that was not amenable to the standard techniques in the field.

Card shuffling procedures provide a natural family of Markov chains which have played a crucial role in the development of the theory of convergence rates of Markov chains; see e.g. [?]. In this paper we analyze the mixing time of a biased card shuffling procedure which has a nonuniform stationary distribution.
1.1 The biased card shuffling chains

As the biased card shuffling has a finite state space, we formulate it as a discrete time Markov chain.

Definition 1.1. For $0 < p < 1$, let $\mathcal{CA}_d(N,p)$ denote the following discrete time Markov chain on permutations of $N$ cards labeled $1, \dots, N$. A step of the chain consists of selecting uniformly at random a pair of adjacent cards and then flipping a coin that is heads with probability $p$. If the coin comes up heads, then we arrange the two cards so that the lower numbered card comes before the higher numbered card. If the coin comes up tails, then we arrange the cards with the higher numbered card first.
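A step of $\mathcal{CA}_d(N,p)$ can be sketched in a few lines of Python. This is an illustrative simulation under our own conventions (the deck is a list whose entry `deck[i]` is the card in position `i`; the helper names `cad_step` and `run_cad` are hypothetical), not code from the paper.

```python
import random


def cad_step(deck, p, rng):
    """Apply one step of CA_d(N, p) in place; deck[i] is the card in position i."""
    i = rng.randrange(len(deck) - 1)       # uniformly chosen adjacent pair (i, i+1)
    lo, hi = sorted((deck[i], deck[i + 1]))
    if rng.random() < p:                   # heads: lower numbered card first
        deck[i], deck[i + 1] = lo, hi
    else:                                  # tails: higher numbered card first
        deck[i], deck[i + 1] = hi, lo
    return deck


def run_cad(n, p, steps, seed=0):
    """Run CA_d(n, p) for `steps` steps from the reversed deck (n, ..., 1)."""
    rng = random.Random(seed)
    deck = list(range(n, 0, -1))
    for _ in range(steps):
        cad_step(deck, p, rng)
    return deck


print(run_cad(8, 0.75, 2000))   # a random permutation biased toward increasing order
```

With $p = 1$ every step sorts the chosen pair, so the chain reduces to a randomized adjacent-transposition bubble sort that ends at the identity.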

Note that if $p = 1/2$ then the stationary distribution of $\mathcal{CA}_d(N,p)$ is the uniform distribution on $S_N$ (the set of all permutations of $N$ elements), but if $p \neq 1/2$ then the stationary distribution is not uniform.

A novel feature of our results is that the heart of the proof of the mixing result for the finite 
card shuffling model involves the analysis of infinite processes on Z. Since infinite processes 
are naturally defined in continuous time, we use a continuous time version of the biased card 
shuffling model. 

Definition 1.2. For $0 < p < 1$, let $\mathcal{CA}(N,p)$ denote the following continuous time Markov chain on permutations of $N$ cards labeled $1, \dots, N$. Each pair of adjacent positions $i, i+1$ is picked at rate 1, independently. When a pair is picked, we toss a coin which is heads with probability $p$. If the coin comes up heads, then we arrange the two cards so that the lower numbered card comes before the higher numbered card. If the coin comes up tails, then we arrange the cards with the higher numbered card first.

Diaconis and Ram [?] were interested in the following slightly different chain:

Definition 1.3. For $1/2 < p < 1$, let $q = 1 - p$ and let $\theta = q/p$. The Metropolis biased card shuffling is the following discrete time Markov chain on permutations of $N$ cards labeled $1, \dots, N$. A step of the chain starts with selecting uniformly at random a pair of adjacent cards. If the two cards are arranged in decreasing order, then we switch them. If they are arranged in increasing order, then with probability $\theta$ we switch them and with probability $1 - \theta$ we do nothing.
1.2 Main results

The total variation distance between measures $\mu$ and $\nu$ on a finite space $X$ is

$$\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in X} |\mu(x) - \nu(x)| = \sup_{A \subset X} |\mu(A) - \nu(A)|.$$

Now we define the mixing time of a Markov chain on a finite state space $X$. For any $x \in X$ let $x_t$ be the distribution on $X$ at time $t$ of the chain started at $x$. The mixing time of the Markov chain is defined by

$$\tau = \inf\Big\{t : \sup_{x, x' \in X} \|x_t - x'_t\|_{TV} < e^{-1}\Big\}.$$
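For very small decks the definition of the mixing time can be evaluated exactly by propagating the distributions $x_t$ from every starting permutation. The sketch below does this for $\mathcal{CA}_d(N,p)$; it is a brute-force illustration of the two definitions above (all helper names are hypothetical), feasible only for $N \le 5$ or so, and it says nothing by itself about the $O(N^2)$ rate.

```python
import itertools
import math


def transition(n, p):
    """Transition probabilities of CA_d(n, p) as {state: {next_state: prob}}."""
    states = list(itertools.permutations(range(1, n + 1)))
    P = {}
    for s in states:
        row = {}
        for i in range(n - 1):                      # each pair has probability 1/(n-1)
            lo, hi = sorted(s[i:i + 2])
            for pair, w in (((lo, hi), p), ((hi, lo), 1 - p)):
                nxt = s[:i] + pair + s[i + 2:]
                row[nxt] = row.get(nxt, 0.0) + w / (n - 1)
        P[s] = row
    return states, P


def tv(mu, nu):
    """Total variation distance: half the L1 distance between the two measures."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in keys)


def step(dist, P):
    """One step of the chain applied to a distribution given as a dict."""
    out = {}
    for x, wx in dist.items():
        for y, pxy in P[x].items():
            out[y] = out.get(y, 0.0) + wx * pxy
    return out


def mixing_time(n, p, t_max=10_000):
    """Smallest t with sup_{x,x'} ||x_t - x'_t||_TV < 1/e (brute force, n >= 2)."""
    states, P = transition(n, p)
    dists = [{s: 1.0} for s in states]              # x_0 is a point mass at x
    for t in range(t_max + 1):
        worst = max(tv(a, b) for a, b in itertools.combinations(dists, 2))
        if worst < math.exp(-1):
            return t
        dists = [step(d, P) for d in dists]
    raise RuntimeError("not mixed within t_max steps")


for n in (2, 3, 4):
    print(n, mixing_time(n, 0.75))
```

For $N = 2$ a single step forgets the starting state, so the mixing time is exactly 1.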

Our main result is

Theorem 1.4. For all $p \neq 1/2$ there exists a constant $K = K(p)$ such that the mixing time of the discrete time biased card shuffling on $N$ cards is at most $KN^2$.

Corollary 1.5. For all $p > 1/2$ there exists a constant $K = K(p)$ such that the mixing time of the Metropolis biased card shuffling on $N$ cards is at most $KN^2$.

Proof. The discrete time card shuffling is a slowed-down version of the Metropolis card shuffling. More precisely, consider the following process: at every step, with probability $1 - p$ we do nothing, and with probability $p$ we do a Metropolis step. This process is the discrete time card shuffling $\mathcal{CA}_d(N,p)$. □
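The slow-down argument can be checked mechanically. Given the chosen pair, both chains only ever move a decreasing pair into increasing order or an increasing pair into decreasing order, so it suffices to compare these two probabilities. The sketch below does this with exact rational arithmetic (the function names are our own).

```python
from fractions import Fraction


def cad_switch_probs(p):
    """CA_d(N, p), given the chosen pair:
    (P[decreasing pair -> increasing], P[increasing pair -> decreasing])."""
    return p, 1 - p


def lazy_metropolis_switch_probs(p):
    """Idle with probability 1 - p, otherwise one Metropolis step with theta = q/p."""
    theta = (1 - p) / p
    # A Metropolis step always switches a decreasing pair,
    # and switches an increasing pair with probability theta.
    return p * 1, p * theta


for p in (Fraction(3, 5), Fraction(2, 3), Fraction(9, 10)):
    assert cad_switch_probs(p) == lazy_metropolis_switch_probs(p)
    print(p, cad_switch_probs(p))
```

The key identity is $p\,\theta = p \cdot q/p = q$.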

This verifies a conjecture of Diaconis and Ram [?]. A lower bound for the mixing time of order $N^2$ is easy and well known.

As the only difference between the two processes is that the continuous time process is "$N - 1$ times faster", the following result is equivalent.

Theorem 1.6. For all $p \neq 1/2$ there exists a constant $K = K(p)$ such that the mixing time of the continuous time biased card shuffling on $N$ cards is at most $KN$.



As our proofs are all done with continuous time processes, we will prove Theorem 1.6 and derive Theorem 1.4 as a corollary.



1.3 Motivations and related results 

When running the Metropolis algorithm for sampling from distributions on large finite sets, it is necessary to ensure that the algorithm converges rapidly. Various techniques have been developed to bound the convergence rate (see Subsection 1.5), yet in many cases none of these methods apply. One such example is the asymmetric card shuffling chain (see Subsection 1.5), which we analyze in a novel way in this paper.

Our result also suggests an interesting comparison between the "systematic scan" and the "random scan" heuristics in sampling (see e.g. [10] or [?]).

In [?] Diaconis and Ram studied a different version of biased card shuffling. In their model the selection of the pair of adjacent cards was not random, but done in a prescribed deterministic manner ("systematic scan"). Their discrete time model, like ours ("random scan"), has a mixing time of $O(N^2)$ for $p \neq 1/2$. Our result may be interpreted as saying that the "systematic scan" does not give an improvement over the "random scan".

In [?] the authors introduce a model of computation where each comparison operation has probability $p > 1/2$ of returning the true result and probability $1 - p$ of returning a false result, independently of other comparisons. The chain $\mathcal{CA}_d(N,p)$ (respectively $\mathcal{CA}_d(N, 1-p)$) performs the randomized version of bubble sort into increasing (respectively decreasing) order in this noisy computation model. Our result shows the robustness to noise of the randomized bubble sort algorithm, as the convergence time of both chains is $O(N^2)$ for all $p > 1/2$.

We would also like to remark that asymmetric exclusion processes, which are the key tool in 
our proof, also play a crucial role in the study of the quantum Heisenberg model. 

We conclude this subsection by discussing some of the history of card shuffling problems. Gilbert, Shannon and Reeds began the mathematical study of card shuffling by introducing a good model for how people shuffle cards [11, 13]. The celebrated theorem of Bayer and Diaconis [?] states that for the Gilbert-Shannon-Reeds model of card shuffling it takes seven shuffles for a standard 52 card deck to be well mixed. More generally, they proved that for an $N$ card deck the mixing time of the Gilbert-Shannon-Reeds model is approximately $\frac{3}{2}\log_2 N$.

In the wake of Bayer and Diaconis's result there have been a number of articles analyzing the mixing time of various methods of card shuffling. Most relevant to this paper are results of Wilson as well as of Diaconis and Ram. Wilson [?] determined the mixing time of $\mathcal{CA}_d(N, 1/2)$ to within a factor of 2: it is of order $N^3 \log N$. Note the sharp contrast with Theorem 1.4, where we show that if $p \neq 1/2$ then the mixing time of $\mathcal{CA}_d(N,p)$ is $O(N^2)$.



1.4 The asymmetric exclusion process 

Most of the proof of our main result is devoted to the analysis of asymmetric exclusion processes. We now define these processes, which are of independent interest; see e.g. [14, 15]. First we define our family of finite exclusion processes. The process $\mathcal{EX}(N,k,p)$ will be an exclusion process with $N$ containers and $k$ particles.
Definition 1.7. Let $k$ and $N$ be integers such that $1 \le k < N$, and let $0 < p < 1$. Let $\mathcal{EX}(N,k,p)$ be the continuous time Markov process defined on

$$X_{N,k} = \Big\{x \in \{0,1\}^N : \sum_{i=1}^{N} x_i = k\Big\}$$

in the following way. Given the current state $x$, each pair of coordinates $i, i+1$ of $x$ is picked at rate 1. If $x_i = x_{i+1}$, then the chain stays at state $x$; otherwise, the two coordinates $i, i+1$ are reassigned as $(x_i, x_{i+1}) = (1,0)$ with probability $p$, and as $(x_i, x_{i+1}) = (0,1)$ with probability $1 - p$.
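The continuous time dynamics of $\mathcal{EX}(N,k,p)$ can be simulated with the standard Gillespie construction: the $N - 1$ pair clocks together ring at rate $N - 1$, and the pair that rings is uniform. The sketch below (hypothetical helper names; 0-indexed lists) uses this to sample the hitting time $H(N,k)$ of $g_{N,k}$ from $m_{N,k}$, as defined in Section 2.

```python
import random


def ex_step(x, p, rng):
    """One update of EX(N, k, p) at a uniformly chosen pair (in place)."""
    i = rng.randrange(len(x) - 1)
    if x[i] != x[i + 1]:
        if rng.random() < p:
            x[i], x[i + 1] = 1, 0        # probability p: particle on the left
        else:
            x[i], x[i + 1] = 0, 1        # probability 1 - p: particle on the right
    return x


def hitting_time(n, k, p, seed=0):
    """Sample the time for EX(n, k, p) started at m_{n,k} to reach g_{n,k}."""
    rng = random.Random(seed)
    x = [0] * (n - k) + [1] * k          # m_{n,k}: all particles on the right
    g = [1] * k + [0] * (n - k)          # g_{n,k}: all particles on the left
    t = 0.0
    while x != g:
        t += rng.expovariate(n - 1)      # waiting time until the next ring
        ex_step(x, p, rng)
    return t


for n in (10, 20, 40):
    mean = sum(hitting_time(n, n // 2, 0.8, seed=s) for s in range(20)) / 20
    print(n, round(mean, 1), round(mean / n, 2))
```

The printed ratio of the sample mean to $N$ is only an empirical illustration; one expects it to stay roughly bounded as $N$ grows, in line with Theorem 1.9 and Lemma 2.7.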

Definition 1.8. Let $0 < p < 1$. Let $\mathcal{EX}(\mathbb{Z},p)$ be the continuous time Markov process defined on $\{0,1\}^{\mathbb{Z}}$ in the following way. Given the current state $x$, each pair of coordinates $i, i+1$ of $x$ is picked at rate 1. If $x_i = x_{i+1}$, then the chain stays at state $x$; otherwise, the two coordinates $i, i+1$ are reassigned as $(x_i, x_{i+1}) = (1,0)$ with probability $p$, and as $(x_i, x_{i+1}) = (0,1)$ with probability $1 - p$.

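We are particularly interested in the set

$$A = \Big\{a : \sum_{i=-\infty}^{-1} (1 - a_i) = \sum_{i=0}^{\infty} a_i < \infty\Big\}.$$

There is a partial order on this set. We write $a \succeq_{\mathbb{Z}} b$ if for all $r$

$$\sum_{i=-\infty}^{r} (1 - a_i) \le \sum_{i=-\infty}^{r} (1 - b_i). \qquad (1)$$

The maximal state in $A$ is the ground state

$$G_{\mathbb{Z}}(i) = \begin{cases} 1 & i < 0, \\ 0 & i \ge 0. \end{cases} \qquad (2)$$

The aspect of asymmetric exclusion processes that we are most interested in is the tail of the hitting time. Given any $x \in A$ (or measure $\mu$ on $A$), the hitting time $H(x)$ (or $H(\mu)$) is defined by

$$H(x) = \inf\{t : x_t = G_{\mathbb{Z}}\}. \qquad (3)$$

In particular we want to consider $H(I_N)$, where

$$I_N(i) = \begin{cases} 1 & i < -N, \\ 0 & i \in [-N, -1], \\ 1 & i \in [0, N-1], \\ 0 & i \ge N. \end{cases} \qquad (4)$$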
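Configurations in $A$ that agree with $G_{\mathbb{Z}}$ outside a finite window can be stored as a finite list. Comparing prefix hole counts inside the window then decides the order (1), because to the left of the window both configurations are fully occupied and to the right both are empty, so they contribute identical increments there. A minimal sketch, with our own hypothetical helper names:

```python
def hole_prefix(a):
    """Running counts sum_{i <= r} (1 - a_i) over the window (sites to the left are 1)."""
    out, c = [], 0
    for v in a:
        c += 1 - v
        out.append(c)
    return out


def in_A(a, lo):
    """Check the defining balance of A for a window starting at site lo."""
    holes_left = sum(1 - v for i, v in enumerate(a, start=lo) if i < 0)
    particles_right = sum(v for i, v in enumerate(a, start=lo) if i >= 0)
    return holes_left == particles_right


def dominates(a, b):
    """a >=_Z b in the sense of (1), for two lists on the same window."""
    return all(x <= y for x, y in zip(hole_prefix(a), hole_prefix(b)))


def ground_state(lo, hi):
    return [1 if i < 0 else 0 for i in range(lo, hi)]


def initial_I(n, lo, hi):
    return [1 if (i < -n or 0 <= i < n) else 0 for i in range(lo, hi)]


lo, hi, n = -6, 6, 3
G, I3 = ground_state(lo, hi), initial_I(n, lo, hi)
print(in_A(G, lo), in_A(I3, lo), dominates(G, I3), dominates(I3, G))
# prints: True True True False
```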
Theorem 1.9. For all $p > 1/2$ and $\epsilon > 0$ there exists a constant $D = D(p, \epsilon)$ s.t.

$$\mathbb{P}(H(I_N) \le DN) \ge 1 - \frac{\epsilon}{N}.$$

In Section 2 we show that Theorem 1.9 implies Theorem 1.6. Most of the work in this paper is in proving Theorem 1.9.



1.5 Remarks on analytic techniques 

When applying standard analytic techniques to the biased card shuffling problem we encounter several obstacles which prevent us from obtaining sharp bounds on the mixing time. We briefly discuss the difficulties in estimating the mixing time in terms of the spectral gap and the log Sobolev constant. The results of [?] also suggest that a group theoretic approach is hard to apply to this particular problem.

The standard bound on the mixing time in terms of the spectral gap $-\lambda_2$ of the generator of the Markov process (see [16] for background) yields

$$\tau \le \frac{2}{-\lambda_2} \log \frac{1}{\min_x \pi(x)}, \qquad (5)$$

where $\pi$ is the stationary distribution. Moreover, combining our reduction from the card shuffling to the exclusion process with the bound on the spectral gap of the (continuous time) exclusion process in [?], it is straightforward to verify that there exist positive $c_1$ and $c_2$ such that the spectral gap satisfies $c_1 < -\lambda_2(N) < c_2$ for all $N$. However, the probability space contains elements of very small probability, so the term $\log(1/\min_x \pi(x))$ is of order $N^2$ (see [?], where the stationary distribution of the Metropolis chain is given). Thus (5) yields a bound of order $N^2$.
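To see where the order $N^2$ comes from, one can use the well-known product form of the stationary distribution, $\pi(\sigma) \propto \theta^{\mathrm{inv}(\sigma)}$ with $\theta = q/p$ and $\mathrm{inv}(\sigma)$ the number of inversions of $\sigma$. This form satisfies detailed balance for the chains above; we state it here as an assumption, since the text defers to [?]. The reversed deck has $N(N-1)/2$ inversions, so $\log(1/\min_x \pi(x))$ grows like $\frac{N^2}{2}\log(p/q)$. The sketch (hypothetical helper names) checks the closed form against brute force for small $N$.

```python
import itertools
import math


def inversions(s):
    return sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])


def log_inv_min_pi_bruteforce(n, p):
    """log(1 / min_x pi(x)) with pi(s) proportional to theta**inv(s), by enumeration."""
    theta = (1 - p) / p
    weights = [theta ** inversions(s) for s in itertools.permutations(range(1, n + 1))]
    return math.log(sum(weights) / min(weights))


def log_inv_min_pi_formula(n, p):
    """Same quantity via Z = prod_{i=1}^{n} (1 + theta + ... + theta**(i-1))."""
    theta = (1 - p) / p
    log_z = sum(math.log(sum(theta ** j for j in range(i))) for i in range(1, n + 1))
    # the reversed deck has the maximal number n(n-1)/2 of inversions
    return log_z + n * (n - 1) / 2 * math.log(1 / theta)


for n in (10, 20, 40, 80):
    val = log_inv_min_pi_formula(n, 0.75)
    print(n, round(val, 1), round(val / n ** 2, 3))
```

The last column approaches $\frac{1}{2}\log 3 \approx 0.549$ for $p = 3/4$.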

A standard way to reduce the dependence on the smallest probability is to use the log Sobolev constant $\alpha$ instead of the spectral gap, with the estimate

$$\tau \le \frac{4}{\alpha} \log^{+} \log \frac{1}{\min_x \pi(x)}. \qquad (6)$$

However, plugging the indicator of the set consisting of the single element $(N, \dots, 1)$ into the variational formula for the log Sobolev constant (see [16]) implies that (for the continuous time model) $\alpha = O(1/N^2)$. (We use the notation $f(N) = O(N)$ ($f(N) = \Omega(N)$) if there exists a constant $c < \infty$ ($c > 0$) such that $f(N) \le cN$ ($f(N) \ge cN$).) Thus (6) does not give the right bound of $N$ on the mixing time.
1.6 Road Map

We conclude the introduction with an overview of the main steps of the proof.

1. In Section 2 we show how Theorem 1.9 implies Theorems 1.6 and 1.4. The reduction follows [17] in using height functions, together with coupling arguments. The height functions provide a coupling of the biased card shuffling to the finite exclusion processes. This reduces the problem of bounding the mixing time of the biased card shuffling to bounding the tail of the hitting times of some finite exclusion processes. Then we couple the finite exclusion processes with the infinite exclusion process. This reduces the problem to bounding the tail of $H(I_N)$. Now we can use some of the machinery developed for the study of exclusion processes on $\mathbb{Z}$.

2. In Section 3 we define the blocking measure $\Psi$ on $\{0,1\}^{\mathbb{Z}}$. We show that $\Psi$ is an invariant measure for $\mathcal{EX}(\mathbb{Z},p)$. In Section 5 we will see how to bound the tail of $H(I_N)$ in terms of the tail of $H(\Psi)$.

3. In Section 4 we introduce an asymmetric exclusion process with second class particles. Second class particles are a common tool in the study of exclusion processes (see e.g. [?]). This will be our main tool for proving Theorem 1.9. We discuss some of the processes related to the asymmetric exclusion process with second class particles.

4. In Section 5 we use exclusion processes with second class particles to bound the tail of $H(I_N)$ in terms of the tail of $H(\Psi)$. Then we bound the tail of $H(\Psi)$. This bound allows us to prove Theorem 1.9, which in turn allows us to bound the mixing time of the biased card shuffling process.

2 Coupling card shuffling to exclusion processes 



In this section we show how Theorem 1.9 implies Theorem 1.6. This reduces bounding the mixing time of the biased card shuffling to bounding the tail of $H(I_N)$. Following [17], we use the following collection of height functions to map a permutation of a deck of cards to an exclusion process configuration.

For any $k$, $1 \le k \le N$, consider the map $h_k : S_N \to X_{N,k}$ defined by

$$(h_k(\pi))_i = \begin{cases} 1 & \pi(i) \le k, \\ 0 & \pi(i) > k, \end{cases}$$

where $\pi(i)$ denotes the card in position $i$.

It is easy to see that
Claim 2.1. $\pi$ is determined by $(h_k(\pi))_{k=1}^{N}$.

For $\pi \in S_N$, we write $\pi_t$ for the random variable representing the value at time $t$ of the process that starts at $\pi$ and evolves according to $\mathcal{CA}(N,p)$. Similarly, for $x \in X_{N,k}$ (or $\{0,1\}^{\mathbb{Z}}$) we let the random variable $x_t$ represent the configuration at time $t$ of the exclusion process that starts at $x$ and evolves according to $\mathcal{EX}(N,k,p)$ ($\mathcal{EX}(\mathbb{Z},p)$).

Claim 2.2. For all $N$ and $k$, the processes $h_k(\pi_t)$ are Markovian. Moreover, $h_k(\pi_t)$ evolves according to the exclusion process $\mathcal{EX}(N,k,p)$.
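The height functions and Claims 2.1 and 2.2 can be checked exhaustively for a small deck. The sketch below assumes the convention stated above, $(h_k(\pi))_i = 1$ exactly when the card in position $i$ is at most $k$. It verifies that $\pi$ is recovered from $(h_k(\pi))_{k=1}^N$ via $\pi(i) = N + 1 - \sum_k (h_k(\pi))_i$, and that every card update with coin outcome $X$ is mapped by $h_k$ to the exclusion update with the same $X$ (0-indexed positions; the function names are our own).

```python
import itertools


def h(k, perm):
    """Height function: 1 where the card is <= k (convention assumed above)."""
    return tuple(1 if card <= k else 0 for card in perm)


def recover(heights):
    """Claim 2.1: the card in position i is N + 1 - sum_k (h_k(pi))_i."""
    n = len(heights)
    return tuple(n + 1 - sum(hk[i] for hk in heights) for i in range(n))


def card_update(perm, i, heads):
    """CA update of positions i, i+1 with coin outcome `heads`."""
    lo, hi = sorted(perm[i:i + 2])
    return perm[:i] + ((lo, hi) if heads else (hi, lo)) + perm[i + 2:]


def ex_update(x, i, heads):
    """EX update of coordinates i, i+1 with the same coin outcome."""
    if x[i] == x[i + 1]:
        return x
    return x[:i] + ((1, 0) if heads else (0, 1)) + x[i + 2:]


n = 4
for perm in itertools.permutations(range(1, n + 1)):
    assert recover([h(k, perm) for k in range(1, n + 1)]) == perm
    for k, i, heads in itertools.product(range(1, n + 1), range(n - 1), (True, False)):
        assert h(k, card_update(perm, i, heads)) == ex_update(h(k, perm), i, heads)
print("height function checks passed")
```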

Throughout the paper we will define a number of couplings. The most important of these we refer to as canonical couplings. The idea of the canonical couplings is that we have a collection of initial conditions of permutations (or exclusion process states), and we use one set of clocks and one set of biased coin flips to update all of the processes simultaneously.

We begin by defining a coupling of the processes $\mathcal{CA}(N,p)$ for all configurations $\pi \in S_N$ in the following way. A transition of the shuffling is performed by choosing a pair of adjacent coordinates $i, i+1$ at rate 1 and tossing a coin $X$ which is heads with probability $p$. If $X = H$ then we rearrange the cards in coordinates $i, i+1$ in increasing order, while if $X = T$ then we rearrange the cards in coordinates $i, i+1$ in decreasing order. The same pair of coordinates $i, i+1$ and the same coin $X$ are used for all $\pi \in S_N$ simultaneously. We call this coupling the canonical coupling for $\mathcal{CA}(N,p)$.

We can similarly define a coupling for all the configurations in $\mathcal{EX}(N,k,p)$. A transition is performed by choosing a pair of adjacent coordinates $i, i+1$ in $\{1, \dots, N\}$ at rate 1 and tossing a coin $X$ which is heads with probability $p$. For a state $x$ of $\mathcal{EX}(N,k,p)$ we update $x$ as follows. If $x_i = x_{i+1}$, then we do nothing. Otherwise, if $X = H$, we let $x_i = 1, x_{i+1} = 0$, and if $X = T$, we let $x_i = 0, x_{i+1} = 1$. Again, the same pair of coordinates and the same coin are used for all states of $\mathcal{EX}(N,k,p)$. We call this coupling the canonical coupling for $\mathcal{EX}(N,k,p)$. In the same way (with pairs $i, i+1$ in $\mathbb{Z}$) we define a canonical coupling for $\mathcal{EX}(\mathbb{Z},p)$.
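The canonical coupling for $\mathcal{CA}(N,p)$ can be simulated directly for a small deck by running all $N!$ copies with one shared clock and one shared coin. The sketch below (hypothetical function name; continuous time via the same Gillespie construction as before) returns the first time at which all copies agree, which is the quantity controlled in Lemma 2.6.

```python
import itertools
import random


def canonical_coupling_perms(n, p, seed=0):
    """Run CA(n, p) from every permutation with shared clocks and coins.

    Returns the first time all n! copies agree, and the final copies."""
    rng = random.Random(seed)
    copies = [list(s) for s in itertools.permutations(range(1, n + 1))]
    t = 0.0
    while any(c != copies[0] for c in copies):
        t += rng.expovariate(n - 1)        # next ring among the n - 1 pair clocks
        i = rng.randrange(n - 1)           # which pair rang (shared by all copies)
        heads = rng.random() < p           # one coin shared by all copies
        for c in copies:
            lo, hi = sorted(c[i:i + 2])
            c[i:i + 2] = [lo, hi] if heads else [hi, lo]
    return t, copies


t, copies = canonical_coupling_perms(4, 0.8, seed=1)
print(round(t, 2), copies[0])
```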

It is immediate to verify that

Claim 2.3. For all $N$, $k$ and $p$, the canonical couplings for $\mathcal{CA}(N,p)$, $\mathcal{EX}(N,k,p)$ and $\mathcal{EX}(\mathbb{Z},p)$ are all well defined and have the right marginals.

Moreover, for all $N$, $k$ and $p$, the map $h_k$ maps the canonical coupling of $\mathcal{CA}(N,p)$ to the canonical coupling of $\mathcal{EX}(N,k,p)$, i.e., if $(\pi_t)_{\pi \in S_N, t \ge 0}$ evolves according to the canonical coupling for $S_N$, then the process

$$\big(\{h_k(\pi_t) : \pi \in S_N\}\big)_{t \ge 0}$$

has the same distribution as the process

$$\big(\{a_t : a \in X_{N,k}\}\big)_{t \ge 0},$$

where $(a_t)_{a \in X_{N,k}, t \ge 0}$ evolve according to the canonical coupling for $\mathcal{EX}(N,k,p)$.

The analysis of the process $\mathcal{EX}(N,k,p)$ utilizes monotonicity properties of the canonical coupling. For $a, b \in X_{N,k}$ we write $a \succeq b$ if for all $r$, $\sum_{i=1}^{r} a_i \ge \sum_{i=1}^{r} b_i$. The maximal state with respect to this partial order is

$$g_{N,k}(i) = \begin{cases} 1 & i \le k, \\ 0 & i > k, \end{cases}$$

and the minimal state with respect to this partial order is

$$m_{N,k}(i) = \begin{cases} 0 & i \le N - k, \\ 1 & i > N - k. \end{cases}$$

We let $H(N,k)$ be the hitting time of the state $g_{N,k}$ for the process $\mathcal{EX}(N,k,p)$ started at $m_{N,k}$. It is immediate to see that

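Claim 2.4. The canonical couplings for $\mathcal{EX}(N,k,p)$ and $\mathcal{EX}(\mathbb{Z},p)$ are monotone. That is, for both processes, if $x \succeq y$, then for all $t$ it holds that $x_t \succeq y_t$.

Since $g_{N,k}$ and $m_{N,k}$ are the maximal and minimal elements with respect to $\succeq$, by Claim 2.4 all coupled copies agree at time $t$ once the copies started from $g_{N,k}$ and $m_{N,k}$ agree; and at time $H(N,k)$ the copy started from $m_{N,k}$ sits at $g_{N,k}$, which forces the copy started from $g_{N,k}$ to sit there too. It follows that

Claim 2.5. Under the canonical coupling for $\mathcal{EX}(N,k,p)$ it holds that

$$\mathbb{P}(\exists x, y \in X_{N,k} \text{ s.t. } x_t \ne y_t) \le \mathbb{P}(H(N,k) > t).$$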
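Claims 2.4 and 2.5 can be watched in a simulation of the canonical coupling started from $g_{N,k}$ and $m_{N,k}$. The sketch below (our own helper names, 0-indexed lists) asserts after every update that the order $\succeq$ is preserved, and reports both the time at which the two copies first agree and the hitting time $H(N,k)$; the first is never larger than the second.

```python
import random


def prefix_sums(x):
    out, c = [], 0
    for v in x:
        c += v
        out.append(c)
    return out


def succeq(a, b):
    """a >= b: every prefix of a holds at least as many particles as that of b."""
    return all(u >= v for u, v in zip(prefix_sums(a), prefix_sums(b)))


def coupled_top_bottom(n, k, p, seed=0):
    """Canonical coupling of EX(n, k, p) from g_{n,k} and m_{n,k}.

    Returns (time the two copies first agree, time the copy from m_{n,k} hits g_{n,k})."""
    rng = random.Random(seed)
    g = [1] * k + [0] * (n - k)
    top, bottom = g[:], [0] * (n - k) + [1] * k
    t, coalesce = 0.0, None
    while True:
        t += rng.expovariate(n - 1)
        i = rng.randrange(n - 1)           # same pair for both copies
        heads = rng.random() < p           # same coin for both copies
        for x in (top, bottom):
            if x[i] != x[i + 1]:
                x[i], x[i + 1] = (1, 0) if heads else (0, 1)
        assert succeq(top, bottom)         # Claim 2.4: the order is preserved
        if coalesce is None and top == bottom:
            coalesce = t
        if bottom == g:
            return coalesce, t


print(coupled_top_bottom(12, 5, 0.75))
```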

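Lemma 2.6. Under the canonical coupling for $\mathcal{CA}(N,p)$ it holds that

$$\mathbb{P}(\exists \sigma, \tau \in S_N \text{ s.t. } \sigma_t \ne \tau_t) \le \sum_{k=1}^{N-1} \mathbb{P}(H(N,k) > t).$$

Proof.

$$\mathbb{P}(\exists \sigma, \tau \in S_N \text{ s.t. } \sigma_t \ne \tau_t) = \mathbb{P}(\exists\, 1 \le k \le N-1,\ \sigma, \tau \in S_N \text{ s.t. } h_k(\sigma_t) \ne h_k(\tau_t)) \qquad (7)$$

$$\le \sum_{k=1}^{N-1} \mathbb{P}(\exists x, y \in X_{N,k} \text{ s.t. } x_t \ne y_t) \le \sum_{k=1}^{N-1} \mathbb{P}(H(N,k) > t), \qquad (8)$$

where (7) follows from Claim 2.1 (note that $h_N$ is constant) and (8) follows from Claims 2.3 and 2.5. □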



The remaining coupling step is to couple the finite processes to an infinite process.

Lemma 2.7. For all $p > 1/2$, all $N$, all $1 \le k < N$, and all $t > 0$, the processes $\mathcal{EX}(N,k,p)$ and $\mathcal{EX}(\mathbb{Z},p)$ satisfy

$$\mathbb{P}(H(N,k) > t) \le \mathbb{P}(H(I_N) > t).$$

Proof. Consider the map $X_{N,k} \to \{0,1\}^{\mathbb{Z}}$ sending $x \in X_{N,k}$ to $\tilde{x}$ with

$$\tilde{x}(i) = \begin{cases} 1 & i < -k, \\ x(i + k + 1) & i \in [-k, N - k - 1], \\ 0 & i > N - k - 1. \end{cases} \qquad (9)$$

We will now couple $\mathcal{EX}(N,k,p)$ to the process $\mathcal{EX}(\mathbb{Z},p)$. More formally, we will couple the canonical coupling of $\mathcal{EX}(N,k,p)$ with the canonical coupling of $\mathcal{EX}(\mathbb{Z},p)$.

For the process $\mathcal{EX}(\mathbb{Z},p)$ we pick a pair of coordinates $i, i+1$ at rate 1 and then use a coin $X$ to rearrange the coordinates $i, i+1$ in the usual manner. Now, if $1 \le i + k + 1 < i + k + 2 \le N$, then we use the same coin $X$ to rearrange the coordinates $i + k + 1, i + k + 2$ for the process $\mathcal{EX}(N,k,p)$.

Clearly this coupling is well defined and has the right marginals. It is easy to see that if $p > 1/2$, then this coupling has the following important property: for all $x \in X_{N,k}$ and all $t > 0$, it holds that $\widetilde{(x_t)} \succeq_{\mathbb{Z}} (\tilde{x})_t$.

Write $I = I_N$, $g = g_{N,k}$ and $m = m_{N,k}$. Then $\tilde{g} = G_{\mathbb{Z}}$ and $\tilde{m} \succeq_{\mathbb{Z}} I$, so it follows that under this coupling, if $I_t = G_{\mathbb{Z}}$, then

$$\widetilde{(m_t)} \succeq_{\mathbb{Z}} (\tilde{m})_t \succeq_{\mathbb{Z}} I_t = G_{\mathbb{Z}},$$

and therefore $m_t = g$. The claim of the lemma follows. □
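The two facts about the map (9) used in the last step, $\tilde{g}_{N,k} = G_{\mathbb{Z}}$ and $\tilde{m}_{N,k} \succeq_{\mathbb{Z}} I_N$, can be checked exhaustively for small $N$. The sketch below (our own helper names) works on the window $[-N-1, N]$, which contains every site where the configurations involved differ from $G_{\mathbb{Z}}$, so comparing prefix hole counts on the window decides (1).

```python
def embed(x, k, lo, hi):
    """The map (9) restricted to the window [lo, hi); x is 0-indexed, so x(j) is x[j-1]."""
    n = len(x)
    out = []
    for i in range(lo, hi):
        if i < -k:
            out.append(1)
        elif i <= n - k - 1:
            out.append(x[i + k])          # x(i + k + 1) in the 1-indexed notation of (9)
        else:
            out.append(0)
    return out


def hole_prefix(a):
    out, c = [], 0
    for v in a:
        c += 1 - v
        out.append(c)
    return out


def dominates(a, b):
    """a >=_Z b in the sense of (1), for two lists on the same window."""
    return all(u <= v for u, v in zip(hole_prefix(a), hole_prefix(b)))


for n in range(2, 9):
    lo, hi = -n - 1, n + 1
    I_n = [1 if (i < -n or 0 <= i < n) else 0 for i in range(lo, hi)]
    G = [1 if i < 0 else 0 for i in range(lo, hi)]
    for k in range(1, n):
        g = [1] * k + [0] * (n - k)
        m = [0] * (n - k) + [1] * k
        assert embed(g, k, lo, hi) == G                 # g_{N,k} goes to the ground state
        assert dominates(embed(m, k, lo, hi), I_n)      # the image of m_{N,k} dominates I_N
print("embedding checks passed")
```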

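We end this section by noting that the canonical coupling of the biased card shuffling shows that with high probability by time $DN$ all of the processes agree.

Lemma 2.8. Theorem 1.9 implies that the canonical coupling for $\mathcal{CA}(N,p)$ has

$$\mathbb{P}(\exists \sigma, \tau \in S_N \text{ s.t. } \sigma_{DN} \ne \tau_{DN}) < \epsilon. \qquad (10)$$

In particular, Theorem 1.9 implies Theorems 1.6 and 1.4.

Proof. By Theorem 1.9, $\mathbb{P}[H(I_N) > DN] \le \epsilon/N$, and therefore by Lemma 2.7, for all $k$ it holds that $\mathbb{P}[H(N,k) > DN] \le \epsilon/N$. It now follows from Lemma 2.6 that

$$\mathbb{P}(\exists \sigma, \tau \in S_N \text{ s.t. } \sigma_{DN} \ne \tau_{DN}) \le (N-1)\epsilon/N < \epsilon,$$

so we obtain (10). Since the total variation distance between the laws of $\sigma_{DN}$ and $\tau_{DN}$ is at most the probability that the coupled copies differ, taking $\epsilon = e^{-1}$ we deduce Theorem 1.6, which immediately implies Theorem 1.4. □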
3 The blocking measure

In this section we define a distribution $\Psi$ on $\{0,1\}^{\mathbb{Z}}$ which is invariant under the action of $\mathcal{EX}(\mathbb{Z},p)$. $\Psi$ is known as the blocking measure (see e.g. [?]). In Section 5 we will bound the tail of $H(\Psi)$ and show that the tail of $H(I_N)$ can be bounded in terms of the tail of $H(\Psi)$. Fix any $p > 1/2$ and let $q = 1 - p$. Define $\mu = \mu(p)$ on $\{0,1\}^{\mathbb{Z}}$ to be the product measure with probabilities

$$\mu(\eta(i) = 1) = \frac{(q/p)^i}{1 + (q/p)^i}, \qquad i \in \mathbb{Z}. \qquad (11)$$
Lemma 3.1 [14]. The measure $\mu$ is stationary for $\mathcal{EX}(\mathbb{Z},p)$.

Proof. This is proven on page 381 of [?]. □
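The identity behind Lemma 3.1 is a detailed-balance relation across each bond $\{i, i+1\}$: the transition $(1,0) \to (0,1)$ happens at rate $q$ and $(0,1) \to (1,0)$ at rate $p$. Under (11) the odds $\mu(\eta(i) = 1)/\mu(\eta(i) = 0) = (q/p)^i$ drop by the factor $q/p$ from one site to the next, which is exactly what the relation requires. The numerical sketch below (our own helper names) checks this; it illustrates the mechanism and is not a substitute for the cited proof.

```python
def rho(i, p):
    """mu(eta(i) = 1) under (11): the odds of an occupied site are (q/p)**i."""
    theta = (1 - p) / p
    return theta ** i / (1 + theta ** i)


def balance_residual(i, p):
    """Flux (1,0) -> (0,1) minus flux (0,1) -> (1,0) across the bond {i, i+1}."""
    q = 1 - p
    a, b = rho(i, p), rho(i + 1, p)
    return a * (1 - b) * q - (1 - a) * b * p


p = 0.7
print(max(abs(balance_residual(i, p)) for i in range(-20, 21)))   # only float round-off
print([round(rho(i, p), 4) for i in (-5, 0, 5)])
```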

Notice that $\mu$ is supported on configurations $\eta$ such that there exists $C_\eta$ with $\eta(i) = 1$ for every $i < -C_\eta$ and $\eta(i) = 0$ for every $i > C_\eta$. There are only countably many configurations of this type and each of them has positive measure. We have already defined

$$A = \Big\{a : \sum_{i=-\infty}^{-1} (1 - a_i) = \sum_{i=0}^{\infty} a_i < \infty\Big\}.$$

Definition 3.2. The blocking measure $\Psi$ on $\{0,1\}^{\mathbb{Z}}$ is defined by

$$\Psi = \mu(\,\cdot \mid A). \qquad (12)$$

Corollary 3.3. $\Psi$ is stationary and ergodic for the exclusion process.

By Poincaré's recurrence theorem and the fact that $\Psi(G_{\mathbb{Z}}) > 0$, we get the following lemma.

Lemma 3.4. $\lim_{T \to \infty} \mathbb{P}(H(\Psi) > T) = 0$.

In Lemma 5.11 we will show that $\mathbb{P}(H(\Psi) > T) = e^{-\Omega(\sqrt{T})}$.



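Lemma 3.5.

$$\Psi(\exists\, i > N : \eta(i) = 1) = \Psi(\exists\, i < -N : \eta(i) = 0) = O\Big(\Big(\frac{q}{p}\Big)^N\Big). \qquad (13)$$

For any $T > 0$,

$$\mathbb{P}(\exists\, t \in (T, T+N) \text{ and } i > 2N \text{ such that } \eta_t(i) = 1) = e^{-\Omega(N)}, \qquad (14)$$

where $\eta_t$ is the exclusion process started from $\Psi$.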
Proof. In the product measure, the $\mu$ probability that there exists an occupied site to the right of position $N$ is bounded by

$$\sum_{i > N} \Big(\frac{1-p}{p}\Big)^i = \frac{p}{2p-1} \Big(\frac{1-p}{p}\Big)^{N+1}.$$

The same bound holds for the probability that there exists an empty site to the left of $-N$. Since $\Psi$ is obtained from the product measure by conditioning on an event of positive probability, the first part of the lemma is true. For the second part, if there exist $t \in (T, T+N)$ and $i > 2N$ such that $\eta_t(i) = 1$, then either

1. there exists $i > N$ such that $\eta_T(i) = 1$, or
2. for some $t \in (T, T+N)$, $\max\{i : \eta_t(i) = 1\} - \max\{i : \eta_T(i) = 1\} > N$.

By the first part of the lemma and the stationarity of $\Psi$, the probability of the first event decreases exponentially in $N$. The second event happens only if the rightmost particle moves to the right $N$ times in a period of length $N$. As moves to the right happen with rate $1 - p < 1/2$, the probability that this happens also decreases exponentially in $N$. □
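Both bounds in the proof can be evaluated numerically. The sketch below (our own helper names) compares the geometric tail with its closed form $\frac{p}{2p-1}\big(\frac{1-p}{p}\big)^{N+1}$, and computes $\mathbb{P}(\mathrm{Poisson}((1-p)N) \ge N)$, which we use here as a stand-in for the chance that the rightmost particle makes at least $N$ rightward jumps in time $N$ (it dominates that chance, since the jumps occur at rate at most $1 - p$). The Poisson tail is summed in log space to avoid cancellation.

```python
import math


def geometric_tail(N, p, terms=5000):
    """Partial sum of sum_{i > N} ((1-p)/p)**i."""
    theta = (1 - p) / p
    return sum(theta ** i for i in range(N + 1, N + 1 + terms))


def geometric_tail_closed(N, p):
    """Closed form p/(2p-1) * ((1-p)/p)**(N+1)."""
    return p / (2 * p - 1) * ((1 - p) / p) ** (N + 1)


def poisson_upper_tail(lam, m, terms=500):
    """P(Poisson(lam) >= m), summed term by term in log space."""
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(m, m + terms))


p = 0.7
for N in (10, 20, 40):
    print(N, geometric_tail_closed(N, p), poisson_upper_tail((1 - p) * N, N))
```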



4 Exclusion processes with second class particles 

The main tool that we will use in the rest of the paper is adding second class particles to our exclusion process. We now describe some of the basics of exclusion processes with second class particles. For a more rigorous treatment of exclusion processes with second class particles see [?].

Definition 4.1. Let $0 < p < 1$. Let $\mathcal{EX}_2(\mathbb{Z},p)$ be the continuous time Markov process defined on $\{0,1,2\}^{\mathbb{Z}}$ in the following way. Given the current state $x$, each pair of coordinates $i, i+1$ of $x$ is picked at rate 1. If $x_i = x_{i+1}$, then the chain stays at state $x$. If the two coordinates $i, i+1$ are initially $(0,1)$ or $(1,0)$, then they are reassigned as $(x_i, x_{i+1}) = (1,0)$ with probability $p$, and as $(x_i, x_{i+1}) = (0,1)$ with probability $1 - p$. If initially they are $(0,2)$ or $(2,0)$, then they are reassigned as $(2,0)$ with probability $p$, and as $(0,2)$ with probability $1 - p$. If initially they are $(1,2)$ or $(2,1)$, then they are reassigned as $(1,2)$ with probability $p$, and as $(2,1)$ with probability $1 - p$.

If $x_i = 1$ then we say that there is a first class particle in position $i$, if $x_i = 2$ then we say that there is a second class particle in position $i$, and if $x_i = 0$, then we say that the site $i$ is empty.
It is helpful to keep in mind the following ordering of 0, 1 and 2. Particle 1 has priority over 0 and 2 in moving to the left. Particle 2 has priority over 0 (but not over 1) in moving to the left. A particle of type 2 is therefore ranked in between a particle of type 0 and a particle of type 1.

It is therefore natural to consider the following two projections. In the first projection $\delta^{2\to1}$, 2's are projected to 1's, while in the second projection $\delta^{2\to0}$, 2's are projected to 0's. More formally,

$$\delta^{2\to1}_t(i) = \begin{cases} 0 & \delta_t(i) = 0, \\ 1 & \delta_t(i) > 0, \end{cases}$$

and

$$\delta^{2\to0}_t(i) = \begin{cases} 0 & \delta_t(i) \ne 1, \\ 1 & \delta_t(i) = 1. \end{cases}$$

Claim 4.2. Both $\delta^{2\to1}$ and $\delta^{2\to0}$ evolve according to $\mathcal{EX}(\mathbb{Z},p)$.
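The update rule of Definition 4.1 and the pairwise content of Claim 4.2 can be checked exhaustively. The sketch below (our own names) encodes the priority ordering $1 > 2 > 0$ for moving left, and verifies that projecting after an update equals updating the projection with the same coin. Restricted to $\{0,1\}$, the rule is exactly the $\mathcal{EX}(\mathbb{Z},p)$ update. Since both dynamics use the same clocks, this pairwise check is the heart of the claim.

```python
import itertools

# Priority for moving left: 1 beats 2 beats 0.
RANK = {0: 0, 2: 1, 1: 2}


def ex2_update(a, b, heads):
    """EX_2(Z, p) update of the pair (a, b); heads has probability p."""
    if a == b:
        return a, b
    strong, weak = (a, b) if RANK[a] > RANK[b] else (b, a)
    return (strong, weak) if heads else (weak, strong)


def proj_2_to_1(v):
    return 0 if v == 0 else 1


def proj_2_to_0(v):
    return 1 if v == 1 else 0


# Projecting after the update equals updating the projection with the same coin.
for a, b, heads in itertools.product((0, 1, 2), (0, 1, 2), (True, False)):
    for proj in (proj_2_to_1, proj_2_to_0):
        lhs = tuple(map(proj, ex2_update(a, b, heads)))
        rhs = ex2_update(proj(a), proj(b), heads)
        assert lhs == rhs
print("projection checks passed")
```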



The next process we consider represents the dynamics between particles of type 1 and particles of type 2 and eliminates all the information on the 0's.

To define $\delta^K_t$ we first eliminate all of the zeroes from $\delta_t$, and then change all of the twos to zeroes. This is only well defined up to translation, so we must also decide which translate we want. We do this by tagging one particle in $\delta_0$ and having $\delta^K_t(0)$ correspond to the tagged particle.

We now make this more formal. Let

$$u_0(0) = \begin{cases} \sup\{i : \delta_0(i) = 1\} & \text{if } \sup\{i : \delta_0(i) = 1\} < \infty, \\ \sup\{i < 0 : \delta_0(i) = 1\} & \text{otherwise}. \end{cases}$$

We refer to the particle which is in position $u_0(0)$ at time 0 as the tagged particle. Let $u_t(0)$ be the location of the tagged particle at time $t$. For $n = 1, 2, \dots$, let

$$u_t(n) = \min\{i > u_t(n-1) : \delta_t(i) > 0\}$$

and

$$u_t(-n) = \max\{i < u_t(-n+1) : \delta_t(i) > 0\}.$$

Thus $u_t(n)$ represents the location of the $n$th particle at time $t$. Let

$$\delta^K_t(i) = \begin{cases} 0 & \delta_t(u_t(i)) = 2, \\ 1 & \delta_t(u_t(i)) = 1. \end{cases} \qquad (15)$$
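The construction of $\delta^K$ can be sketched for a configuration given on a finite window. The sketch below is an illustration under our own assumptions: it computes only $\delta^K_0$ (at time 0, where the tagged particle is located by the first case in the definition of $u_0(0)$), assumes the window contains at least one first class particle, and returns a dictionary indexed by the relative particle number $n$. The function name is hypothetical.

```python
def second_class_view(delta, lo):
    """Sketch of delta^K from (15) at time 0, for a configuration on the finite
    window [lo, lo + len(delta)) with values in {0, 1, 2}.

    The tagged particle is the rightmost first class particle (the first case in
    the definition of u_0(0)); particles are indexed relative to it."""
    positions = [i for i, v in enumerate(delta, start=lo) if v > 0]   # u(n), in order
    tagged = max(i for i in positions if delta[i - lo] == 1)
    offset = positions.index(tagged)
    return {n - offset: (1 if delta[u - lo] == 1 else 0)
            for n, u in enumerate(positions)}


print(second_class_view([1, 0, 2, 1, 0, 2, 2], lo=-3))
# {-2: 1, -1: 0, 0: 1, 1: 0, 2: 0}
```

Zeros are dropped, twos become zeros, and index 0 is the tagged (first class) particle.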
Lemma 4.3. If $\delta$ evolves according to $\mathcal{EX}_2(\mathbb{Z},p)$ and has an initial distribution such that $\delta^K_0$ stochastically dominates $\Psi$, then $\delta^K_t$ stochastically dominates $\Psi$ for all $t$.

Proof. In [14] it is shown that for all $i \in \mathbb{Z}$, $\Psi$ is invariant under the Markov operator on $\{0,1\}^{\mathbb{Z}}$ which at rate 1 tosses a coin $X$ which is heads with probability $p$, and then, if $x_i \ne x_{i+1}$ and $X = H$, updates $x_i$ and $x_{i+1}$ as $x_i = 1, x_{i+1} = 0$, and if $x_i \ne x_{i+1}$ and $X = T$, updates them as $x_i = 0, x_{i+1} = 1$. This implies that $\Psi$ is an invariant measure for the process $\delta^K$. It now follows from the monotonicity of the canonical coupling that if $\delta^K_0$ stochastically dominates $\Psi$, then $\delta^K_t$ stochastically dominates $\Psi$ for all $t$. □



5 Proof of Main Results 



Let $\{Y_i\}_{i \in \mathbb{Z}}$ be i.i.d. random variables s.t. $\mathbb{P}(Y_i = 0) = \mathbb{P}(Y_i = 1) = 1/2$, and let $Z_i = 2Y_i$. The main tool for proving Theorem 1.9 is to study $\mathcal{EX}_2(\mathbb{Z},p)$ with initial conditions

$$\sigma_0(i) = \begin{cases} 1 & i < -N, \\ 0 & i \in [-N, -1], \\ 1 & i \in [0, N-1], \\ Z_i & i \ge N. \end{cases} \qquad (16)$$

This is useful for proving Theorem 1.9 because $\sigma^{2\to0}_t = (I_N)_t$.

For any $a \in \{0,1\}^{\mathbb{Z}}$ such that $\lim_{i \to -\infty} a_i = 1$ we set

$$L(a) = \min\{i : a_i = 0\}. \qquad (17)$$

This indicates the leftmost empty position. In the same way, for $a$ such that $\lim_{i \to +\infty} a_i = 0$, we indicate the rightmost particle by

$$R(a) = \max\{i : a_i = 1\}.$$



For a constant $C$ we define three events:

$$A_1(C,N) = \{\forall t \in (CN, (C+1)N)\ \ L(\sigma^{2\to1}_t) > 2N\}, \qquad (18)$$

$$A_2(C,N) = \{\forall t \in (CN, (C+1)N)\ \ R(\sigma^K_t) < 2N\}, \qquad (19)$$

and

$$A_3(C,N) = \{\exists t \in (CN, (C+1)N) \text{ such that } \sigma^K_t = G_{\mathbb{Z}}\}. \qquad (20)$$
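Lemma 5.1.

$$\mathbb{P}(H(I_N) < (C+1)N) \ge 1 - \mathbb{P}(A_3^c(C,N) \mid A_1(C,N), A_2(C,N)) - \mathbb{P}(A_1^c(C,N)) - \mathbb{P}(A_2^c(C,N)).$$

Proof. Recall that by $G_{\mathbb{Z}}$ we denote the ground state (equation (2)). Notice that $\sigma^{2\to0}_t = G_{\mathbb{Z}}$ if $\sigma^K_t = G_{\mathbb{Z}}$ and $L(\sigma^{2\to1}_t) > 0$. Thus if $A_1(C,N)$ and $A_3(C,N)$ both occur then there exists $t < (C+1)N$ such that $\sigma^{2\to0}_t = (I_N)_t = G_{\mathbb{Z}}$. Since $\mathbb{P}(A_1 \cap A_3) \ge \mathbb{P}(A_1 \cap A_2 \cap A_3) \ge 1 - \mathbb{P}(A_1^c) - \mathbb{P}(A_2^c) - \mathbb{P}(A_3^c \mid A_1, A_2)$, the lemma follows. □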

Although the previous lemma does not depend on the definition of $A_2(C,N)$, we use this event because it makes it easy to bound

$$\mathbb{P}(A_3^c(C,N) \mid A_1(C,N), A_2(C,N))$$

in terms of the tail of $H(\Psi)$.

Lemma 5.2.

$$\mathbb{P}(A_3^c(C,N) \mid A_1(C,N), A_2(C,N)) \le \mathbb{P}(H(\Psi) > N).$$

Proof. If $A_1(C,N)$ and $A_2(C,N)$ both happen then $\sigma^K$ behaves according to $\mathcal{EX}(\mathbb{Z},p)$ conditioned on the event that there is never a particle to the right of $2N$. By Lemma 4.3 the distribution of $\sigma^K_{CN}$ stochastically dominates $\Psi$. Putting these two facts together gives us $\mathbb{P}(A_3^c(C,N) \mid A_1(C,N), A_2(C,N)) \le \mathbb{P}(H(\Psi) > N)$. □

It is also not difficult to bound $\mathbb{P}(A_2^c(C,N))$.

Lemma 5.3. $\mathbb{P}(A_2^c(C,N)) = e^{-\Omega(N)}$.

Proof. This follows from Lemmas 3.5 and 4.3. □

Our next goal is to bound $\mathbb{P}(A_1^c(C,N))$. Then we will bound the tail of $H(\Psi)$. In order to bound $\mathbb{P}(A_1^c(C,N))$ we first bound $\mathbb{P}(\tilde{A}_1^c(C,N))$, where

$$\tilde{A}_1(C,N) = \{L(\sigma^{2\to1}_{CN}) > 3N\},$$

and use the following lemma.

Lemma 5.4. $\mathbb{P}(A_1^c(C,N)) \le \mathbb{P}(\tilde{A}_1^c(C,N)) + e^{-\Omega(N)}$.

Proof. If $\tilde{A}_1(C,N)$ happens but $A_1(C,N)$ does not, then the leftmost container without a particle moves to the left at least $N$ times in time $N$. For that to happen, the clock to the left of this container has to ring at least $N$ times, each time with the hole moving left. Since such moves happen at rate $1 - p < 1$, by simple large deviation estimates for Poisson variables the probability of this happening decreases exponentially in $N$. □
To bound the probability of $\tilde{A}_1^c(C,N)$ we study the process $\beta$ which has initial distribution

$$\beta_0(i) = \begin{cases} Y_i & i < 0, \\ Z_i & i \ge 0. \end{cases} \qquad (21)$$

That is, the initial distribution of $\beta$ is the following: every place contains a particle with probability 1/2, and the places are independent of each other. Left of the origin the particles are first class particles, while right of the origin the particles are second class particles.

We are interested in the processes $\beta^{2\to0}$ and $\beta^{2\to1}$. The process $\beta^{2\to1}$ is the stationary i.i.d. process. The process $\beta^{2\to0}$, on the other hand, is the process that starts with no particles on the right half of the line and an i.i.d. measure on its left half.

Let $x(t)$ be the location of the tagged particle in $\beta^{2\to0}$ at time $t$, and let $x'(t)$ be the location of the tagged particle in $\beta^{2\to1}$ at time $t$. We will bound the expectation and variance of $x(t)$.



The following lemma follows from the proof of the shock wave phenomenon in [15]. For the convenience of the reader, we prove it here again.

Lemma 5.5. There exists $\rho < 1$ such that under the canonical coupling, for every $n$ and for every time $t$,

$$\mathbb{P}(|x(t) - x'(t)| > n) < \rho^n.$$

Proof. We define the processes $\beta^\ell$ and $\beta^d$ ($\ell$ stands for locations and $d$ stands for distances): $\beta^\ell_t(i)$ is the location of the $i$-th particle in $\beta^{2\to1}$. To be more precise, $\beta^\ell_t(0) = x'(t)$, and, inductively, for positive $i$ we take

$$\beta^\ell_t(i) = \min\{j : j > \beta^\ell_t(i-1) \text{ and } \beta^{2\to1}_t(j) = 1\},$$

and for negative $i$, equivalently, we take

$$\beta^\ell_t(i) = \max\{j : j < \beta^\ell_t(i+1) \text{ and } \beta^{2\to1}_t(j) = 1\}.$$

$\beta^d_t(i)$ is defined to be $\beta^\ell_t(i) - \beta^\ell_t(i-1)$.

Since for every $t$, $\{\beta^{2\to1}_t(i)\}_{i \in \mathbb{Z}}$ is distributed according to the $(1/2, 1/2)$ product measure, we get that for every $t$, $\{\beta^d_t(i)\}_{i \in \mathbb{Z}}$ are i.i.d. geometric variables with parameter 1/2.

Let

$$s(t) = \sup\{i : \beta^K_t(i) = 1\}.$$

Then for all $t$

$$x(t) - x'(t) = \sum_{i=1}^{s(t)} \beta^d_t(i).$$
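By Lemmas 3.5 and 4.3,

$$\mathbb{P}(s(t) > n/3) \le \Psi(\exists\, i > n/3 : \eta(i) = 1) = O\Big(\Big(\frac{1-p}{p}\Big)^{n/3}\Big).$$

By the distribution of $\{\beta^d_t(i)\}_{i \in \mathbb{Z}}$ there exists $a < 1$ such that

$$\mathbb{P}\Big(\sum_{i=1}^{n/3} \beta^d_t(i) > n\Big) = O(a^n).$$

Thus there exists $\rho < 1$ such that

$$\mathbb{P}(|x(t) - x'(t)| > n) \le \mathbb{P}(s(t) > n/3) + \mathbb{P}\Big(\sum_{i=1}^{n/3} \beta^d_t(i) > n\Big) < \rho^n.$$

□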
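The large deviation step for the gaps can be made concrete. Assuming the convention that a geometric variable with parameter 1/2 takes the values $1, 2, \dots$ with $\mathbb{P}(g) = 2^{-g}$ (the spacing between successive occupied sites in an i.i.d. fair configuration), a sum of $m$ gaps exceeds $n$ exactly when fewer than $m$ of the next $n$ sites are occupied. So the probability is a binomial tail, computed exactly below (hypothetical function name).

```python
from math import comb


def gap_sum_tail(n, m):
    """P(sum of m i.i.d. gaps with P(gap = g) = 2**-g exceeds n).

    The sum exceeds n exactly when fewer than m of n fair coin flips
    are successes, so this is P(Binomial(n, 1/2) < m)."""
    return sum(comb(n, j) for j in range(m)) / 2 ** n


for n in (30, 60, 120, 240):
    print(n, gap_sum_tail(n, n // 3))
```

The printed values decay exponentially in $n$, as claimed for the constant $a < 1$.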



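The following lemma is proved in [12]:

Lemma 5.6 (Kipnis).

$$\lim_{t \to \infty} \frac{\mathbb{E}(x'(t))}{t} = s := -\frac{1}{2}(2p-1), \qquad \lim_{t \to \infty} \frac{\operatorname{Var}(x'(t))}{t} = v < \infty.$$

Remark: note that $s < 0$.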



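Combining Lemma 5.6 and Lemma 5.5 with the fact that $\mathbb{E}(x(t))$ and $\operatorname{Var}(x(t))$ are continuous in $t$, we get

Lemma 5.7. There exist $t_0$, $s'$ with $s < s' < 0$, and $w$ with $v < w < \infty$, such that for every $t > t_0$,

$$\frac{\mathbb{E}(x(t))}{t} < s' \qquad \text{and} \qquad \frac{\operatorname{Var}(x(t))}{t} < w.$$

Consider the exclusion process $\gamma$ which has the initial distribution

$$\gamma_0(i) = \begin{cases} 1 & i < 0, \\ 1 - Y_i & i \ge 0. \end{cases} \qquad (22)$$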
One can couple a copy $\gamma'$ of $\gamma$ with $\beta^{2\to0}$ such that for all $t$ and $i$

$$\gamma'_t(i) = 1 - \beta^{2\to0}_t(-i).$$

Since $x(t)$ corresponds to the location of the rightmost particle in $\beta^{2\to0}$, the processes $-x(t)$ and $L(\gamma_t)$ have the same law.

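Applying Chebyshev's inequality we get the following estimate:

Lemma 5.8. For any $\delta > 0$ and any $t > t_0$,

$$\mathbb{P}\Big(x(t) < \big(s' + \sqrt{w/\delta}\big)t\Big) > 1 - \delta/t$$

and

$$\mathbb{P}\Big(L(\gamma_t) > -\big(s' + \sqrt{w/\delta}\big)t\Big) > 1 - \delta/t.$$

For any $\ell$ we define $\gamma^\ell$ to be the process starting at $\gamma^\ell_0(i) = \gamma_0(i + \ell)$, i.e., $\gamma_0$ shifted $\ell$ sites to the left. We get from Lemma 5.8 that for any $t > t_0$,

$$\mathbb{P}\Big(L(\gamma^\ell_t) > -\ell - \big(s' + \sqrt{w/\delta}\big)t\Big) = \mathbb{P}\Big(L(\gamma_t) > -\big(s' + \sqrt{w/\delta}\big)t\Big) > 1 - \delta/t. \qquad (23)$$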
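The Chebyshev step behind Lemma 5.8, and the choice of $\delta$ used in Lemma 5.9, can be written out as follows. This is a worked derivation in the notation of Lemma 5.7 (constants $s'$, $w$, $t_0$), for any $a > 0$ and $t > t_0$:

```latex
\begin{aligned}
\mathbb{P}\big(x(t) \ge (s' + a)\,t\big)
  &\le \mathbb{P}\big(x(t) - \mathbb{E}\,x(t) \ge a t\big)
     && \text{since } \mathbb{E}\,x(t) < s' t, \\
  &\le \frac{\operatorname{Var}(x(t))}{a^2 t^2} \le \frac{w}{a^2 t}
     && \text{Chebyshev, and } \operatorname{Var}(x(t)) < w t, \\
  &= \frac{\delta}{t}
     && \text{for } a = \sqrt{w/\delta}.
\end{aligned}
```

With $\delta = 4w/s'^2$ one gets $\sqrt{w/\delta} = -s'/2$, so $s' + \sqrt{w/\delta} = s'/2 < 0$. This is the form in which the bound enters the proof of Lemma 5.9.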

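Now we are ready to bound $\mathbb{P}(\tilde{A}_1^c)$.

Lemma 5.9. For any $\epsilon > 0$, there exists a constant $C = C(p, \epsilon)$ such that for all $N > \lceil t_0 \rceil + 1$

$$\mathbb{P}(\tilde{A}_1^c(C,N)) < \frac{\epsilon}{N}.$$

Proof. As the canonical coupling preserves domination, if

$$\gamma^\ell_0 \preceq \sigma^{2\to1}_0 \qquad \text{and} \qquad L(\gamma^\ell_t) > 3N,$$

then $L(\sigma^{2\to1}_t) > 3N$. This gives us that for any $\ell$ and $t$

$$\mathbb{P}(L(\sigma^{2\to1}_t) > 3N) \ge 1 - \mathbb{P}(\gamma^\ell_0 \not\preceq \sigma^{2\to1}_0) - \mathbb{P}(L(\gamma^\ell_t) \le 3N). \qquad (24)$$

Choose $j$ such that for all $N$

$$\mathbb{P}(\gamma^{jN}_0 \not\preceq \sigma^{2\to1}_0) < \frac{\epsilon}{2N}.$$

Then the lemma follows from equations (23) and (24) with $\ell$, $\delta$, $C$ and $t$ chosen such that $\ell = jN$, $\delta = 4w/s'^2$, $C > \max(-2(3+j)/s', 2\delta/\epsilon, 1)$, and $t = CN$. This is because, by Lemma 5.8 and equation (23),

$$\begin{aligned} \mathbb{P}(L(\gamma^{jN}_{CN}) > 3N) &= \mathbb{P}(L(\gamma_{CN}) > (3+j)N) \\ &\ge \mathbb{P}\Big(L(\gamma_{CN}) > -\frac{s'}{2}CN\Big) \\ &= \mathbb{P}\Big(L(\gamma_{CN}) > -\big(s' + \sqrt{w/\delta}\big)CN\Big) \\ &\ge 1 - \frac{\delta}{CN} \\ &> 1 - \frac{\epsilon}{2N}. \end{aligned}$$

□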



Lemma 5.10. For every $N > \lceil t_0 \rceil + 1$ and $\epsilon > 0$,

$$\mathbb{P}(H(I_N) < (C+1)N) > 1 - \frac{\epsilon}{N} - e^{-\Omega(N)} - \mathbb{P}(H(\Psi) > N). \qquad (25)$$

Proof. This follows from Lemmas 5.1, 5.2, 5.3, 5.4 and 5.9. □



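In order to prove Theorem 1.9 we first prove the following lemma.

Lemma 5.11.

$$\mathbb{P}(H(\Psi) > t) = e^{-\Omega(\sqrt{t})}.$$

Proof. For every $N$ large enough, we wish to estimate the probability that $H(\Psi) > (C+1)N^2$, where $C$ is the constant from Lemma 5.10. We take $N$ large enough so that the probability in (25) is bigger than $1/2$. Such $N$ exists by Lemma 3.4. Recall that

$$I_N(i) = \begin{cases} 1 & i < -N, \\ 0 & i \in [-N, -1], \\ 1 & i \in [0, N-1], \\ 0 & i \ge N. \end{cases}$$

Now, for every $j = 0, 1, 2, \dots, N$, let $P_j = \mathbb{P}(H(\Psi) > (C+1)Nj)$. Of course, $P_0 = 1$. Now we proceed inductively. Let $\eta_t$ denote the exclusion process started from $\Psi$, and let

$$U_N = 1 - \mathbb{P}(\eta_t \succeq_{\mathbb{Z}} I_N).$$

Notice that by Corollary 3.3, $U_N$ does not depend on $t$, and, by (13), $U_N = e^{-\Omega(N)}$. By the monotonicity of the canonical coupling and the choice of $N$,

$$\mathbb{P}\big(H(\Psi) > t + (C+1)N \,\big|\, H(\Psi) > t,\ \eta_t \succeq_{\mathbb{Z}} I_N\big) < \frac{1}{2}.$$

Hence, by the Markov property at time $t$, for every $t$,

$$\mathbb{P}(H(\Psi) > t + (C+1)N) \le \frac{1}{2}\,\mathbb{P}(H(\Psi) > t) + U_N.$$

Therefore $P_i \le P_{i-1}/2 + U_N$ for every $i > 0$, and therefore

$$\mathbb{P}(H(\Psi) > (C+1)N^2) = P_N \le 2^{-N} + \sum_{i=1}^{N} 2^{1-i}\, U_N \le 2^{-N} + 2U_N = e^{-\Omega(N)}.$$

By monotonicity we can interpolate and get that for every $t$

$$\mathbb{P}(H(\Psi) > (C+1)t) = e^{-\Omega(\sqrt{t})},$$

and thus

$$\mathbb{P}(H(\Psi) > t) = e^{-\Omega(\sqrt{t})}.$$

□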
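The recursion in the proof above can be iterated numerically to see why it gives $P_N \le 2^{-N} + 2U_N$: unrolling $P_i \le P_{i-1}/2 + U_N$ from $P_0 = 1$ yields $2^{-N} + 2U_N(1 - 2^{-N})$ in the worst case. In the sketch below, $U_N = (q/p)^N$ is an illustrative choice in the spirit of (13), not a value derived in the text, and the function name is our own.

```python
def iterate_bound(N, U):
    """Worst case of the recursion P_0 = 1, P_i <= P_{i-1} / 2 + U."""
    P = 1.0
    for _ in range(N):
        P = P / 2 + U
    return P


p = 0.75
for N in (5, 10, 20, 40):
    U = ((1 - p) / p) ** N        # illustrative choice U_N = (q/p)**N, cf. (13)
    print(N, iterate_bound(N, U), 2 ** -N + 2 * U)
```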



We can now prove our main results.

Proof of Theorems 1.9, 1.6, and 1.4. By Lemma 5.11, $\mathbb{P}(H(\Psi) > N) = e^{-\Omega(\sqrt{N})} = o(N^{-1})$. Therefore, by (25),

$$\mathbb{P}(H(I_N) < (C(p,\epsilon) + 1)N) > 1 - \frac{\epsilon}{N} - o(N^{-1}).$$

Taking $D = C(p, \epsilon/2) + 1$, we get that Theorem 1.9 is satisfied for all sufficiently large $N$. Thus we can choose $D$ so that it is true for all $N$. Theorems 1.4 and 1.6 follow by Lemma 2.8. □



We conclude the paper with a brief comment about how $D = D(p)$ depends on $p$. We see in Lemma 5.6 that $s = -\frac{1}{2}(2p-1)$ and $v < 32/(2p-1)$. For large $N$, in Lemma 5.9 using $\epsilon = 1/e$, we can choose

$$C = \frac{8ve}{s^2} < \frac{1024e}{(2p-1)^3}.$$

For large $N$ we can choose

$$D = 2C < \frac{2048e}{(2p-1)^3}.$$

It is easy to show that $D$ must be chosen bigger than $1/(2p-1)$. The discrepancy in the power of $2p - 1$ comes from the use of Chebyshev's inequality in Lemma 5.8. We believe that a more careful analysis would allow one to choose $D$ such that $D = \Theta(1/(2p-1))$.